-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Expand type compatibility for match function and operator #117555
ESQL: Expand type compatibility for match function and operator #117555
Conversation
… with the appropriate object
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java
Documentation preview: |
@@ -86,7 +86,7 @@ public static List<Object[]> readURLSpec(URL source, Parser parser) throws Excep | |||
lineNumber++; | |||
} | |||
if (testName != null) { | |||
throw new IllegalStateException("Read a test without a body at the end of [" + fileName + "]."); | |||
throw new IllegalStateException("Read a test [" + testName + "] without a body at the end of [" + fileName + "]."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this change, but useful to pinpoint an error in a CSV test
*/ | ||
public final String queryAsText() { | ||
public final Object queryAsObject() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match query needs an object for it to work correctly on specific types. Changing the signature of this function to return an object
Object queryAsObject = query().fold(); | ||
if (queryAsObject instanceof BytesRef bytesRef) { | ||
return bytesRef.utf8ToString(); | ||
} else if (query().dataType() == DataType.UNSIGNED_LONG) { | ||
return NumericUtils.unsignedLongAsBigInteger((Long) queryAsObject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsigned long needs to be converted to BigInteger for match function to work correctly
@@ -252,8 +251,6 @@ static boolean canPushToSource(Expression exp, LucenePushdownPredicates lucenePu | |||
&& Expressions.foldable(cidrMatch.matches()); | |||
} else if (exp instanceof SpatialRelatesFunction spatial) { | |||
return canPushSpatialFunctionToSource(spatial, lucenePushdownPredicates); | |||
} else if (exp instanceof Match mf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These conditions are checked in validate() / type resolution in Match, so we can simplify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is fine to remove. I'm also not sure that it was ever really necessary? Since it seems that the field and type were already checked in Match even before this change - which is fine, just trying to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was unnecessary once we added the Validatable
interface, where additional checks can be done once the fields have been resolved.
throw new IllegalArgumentException("Unsupported type in tests: " + dataType); | ||
} | ||
Object value = EsqlTestUtils.randomLiteral(dataType).value(); | ||
if (value instanceof BytesRef bytesRef) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a bit tricky to get the correct base object from the randomLiteral
function, but I wanted to reuse it for calculating random values for specific data types
@elasticmachine run buildkite/docs-build-pr |
unsigned_long | long | boolean | ||
unsigned_long | unsigned_long | boolean | ||
version | keyword | boolean | ||
version | version | boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great - it's exactly what we need - is this the exact table that gets generated when we run the match tests or did you have to do any changes?
I haven't looked closely at how this gets generated, but I assume that because we have the checkParamCompatibility
check then we only get the valid combinations here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is automatically generated from the MatchTests
, based on the parameters provided as part of the parameters()
method.
The test generates the positive test cases, and then uses the errorsForCasesWithoutExamples
method to add failing tests for the remaining combinations. These failing tests include the tests that will fail the checkParamCompatibility
condition.
} | ||
|
||
if (fieldType.isNumeric() && queryType.isNumeric()) { | ||
// When doing an unsigned long query, field must be an unsigned long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an explicit error returned - see here.
We only throw an error for queries that are unsigned long when the field is NOT an unsigned long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest. The normal workflow works pretty well. I have a mixed data type scenario, should we support it or return an error message ? I'll leave it to you to decide how to address it, as I'm not sure how the match
query handles mixed data types. Here are the steps to reproduce it:
+ curl -u elastic:password -X DELETE 'localhost:9200/idx*/'
+ curl -u elastic:password -X PUT 'localhost:9200/idx001?pretty' -H 'Content-Type: application/json' '-d
{
"mappings": {
"properties": {
"numericfield": {
"type": "integer"
},
"stringfield": {
"type": "text"
}
}
}
}
'
+ curl -u elastic:password -X PUT 'localhost:9200/idx002?pretty' -H 'Content-Type: application/json' '-d
{
"mappings": {
"properties": {
"numericfield": {
"type": "long"
},
"stringfield": {
"type": "keyword"
}
}
}
}
'
+ curl -X PUT 'localhost:9200/idx001/_bulk?refresh&pretty' -H 'Content-Type: application/json' '-d
{"index": {}}
{"numericfield": "1", "stringfield": "idx001"}
'
+ curl -X PUT 'localhost:9200/idx002/_bulk?refresh&pretty' -H 'Content-Type: application/json' '-d
{"index": {}}
{"numericfield": "2", "stringfield": "idx002"}
'
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from idx001 | WHERE match(numericfield, 1)"
}
'
numericfield | stringfield
---------------+---------------
1 |idx001
empty result
============
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from idx002 | WHERE match(numericfield, 1)"
}
'
numericfield | stringfield
---------------+---------------
empty result
============
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from idx001, idx002 | WHERE match(numericfield::int, 1)"
}
'
numericfield | stringfield
---------------+---------------
@@ -78,7 +78,7 @@ regexBooleanExpression | |||
; | |||
|
|||
matchBooleanExpression | |||
: fieldExp=qualifiedName COLON queryString=constant | |||
: fieldExp=qualifiedName COLON matchQuery=constant (CAST_OP dataType)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding correct that match
always works with literal strings no matter what the field data type is? If this is true, adding explicit casting does seem unnecessary, to be honest, I'm confused by :
+ ::
when I saw it the first time.
I couldn't think of a case that explicit casting is needed for match function or operator, please correct me if I'm wrong. Supporting explicit casting in the matchQuery
may give people impressions that we support an expression there, however it is not true. If it is not necessary, shall we just keep it as simple as a constant?
I have a weird case, if we don't have to support explicit casting here, we don't have to deal with this.
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from sample_data | WHERE match(event.duration, \"725448\"::time_duration)"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "argument of [\"725448\"] must be a constant, received [\"725448\"]"
}
],
"type" : "illegal_argument_exception",
"reason" : "argument of [\"725448\"] must be a constant, received [\"725448\"]"
},
"status" : 400
}
I see, this is interesting. The problem lies on the ES field being mapped as a We can remove that restriction by checking that we have a EVAL would be needed to retrieve the actual values, similar to how multi index works now. So for example:
would retrieve the following:
The alternative is to detect that at the verification level and issue an error stating that match cannot be used for querying fields with different data types. As I think allowing this is a better option, I'll start working on it - LMK if you think this should not be allowed. |
Multi-typed scenarios are tricky, and I hit a problem with it before on my own PR. For the example above, idx001 return 1 record, idx002 returns 0 record, I was expecting 1 record return when querying both indices in the same query(like how the match query behave below), or get an error message says it is not supported.
|
… queries with different types
@fang-xing-esql , I've tried to add multi-index, multi-type support in 347b3ce . I've added some CSV tests for that so you can see what it looks like. LMKWYT! |
if (field instanceof FieldAttribute fieldAttribute) { | ||
String fieldName = fieldAttribute.name(); | ||
if (fieldAttribute.field() instanceof MultiTypeEsField multiTypeEsField) { | ||
// If we have multiple field types, we allow the query to be done, but getting the underlying field name | ||
fieldName = multiTypeEsField.getName(); | ||
} | ||
return new MatchQuery(match.source(), fieldName, match.queryAsObject()); | ||
} | ||
|
||
throw new IllegalArgumentException("Match must have a field attribute as the first argument"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest ! I'm on the fence about whether keeping these(defecting the real field name and throw IllegalArgumentException
) on the data node or on the coordinator node(inside the Match
function itself), so that we can keep MatchFunctionTranslator
clean here, throw new IllegalArgumentException
on data node caught my eyes. Is this something can be done safely on the coordinator node?
I have tried this, and the tests related to multi-typed fields that I can think of return as expected. I'd suggest to have more tests to cover more data types besides int
and long
with multi-type fields as part of the regression tests - double
, unsigned_long
, boolean
, date
, strings
, without having them in the regression tests it is hard to tell when things change. semantic_text
will be supported by match
soon, depending on which PR merges first, we should have tests to cover it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence about whether keeping these(defecting the real field name and throw IllegalArgumentException) on the data node or on the coordinator node(inside the Match function itself), so that we can keep MatchFunctionTranslator clean here, throw new IllegalArgumentException on data node caught my eyes. Is this something can be done safely on the coordinator node?
Hey @fang-xing-esql ! I added the IAE just to signal something broke in case we got there. We can't actually get to that point as this is already being validated in the validate()
method for Match
, which is being invoked at the coordinator level.
Having the IAE will help us notice in case the validate()
method is not doing what it should. I can add some more info on the exception / comment if you think that's necessary.
. I'd suggest to have more tests to cover more data types besides int and long with multi-type fields as part of the regression tests - double, unsigned, boolean, strings, without having them in the regression tests it is hard to tell when things change
I understand. As I already created a conflicting mapping with employees
, I will add some more tests to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fang-xing-esql I added some tests in b182a87, LMKWYT!
I needed to fix some tests that depended on the basic mapping defaults in fa14ff5
Thanks for adding more tests @carlosdelest, the coverage looks much better. Besides the explicit casting on the I came across one scenario when experimenting with multi-type fields, it is related to runtime error handling, it could be an edge case, it doesn't look very uncommon though.
|
…s provided in the query
@fang-xing-esql , that is due to match query behaviour. The good news, it can be controlled by the I've added leniency to match queries and a test for this behaviour in 9060cba .
I will remove it on Monday if @ioanatia doesn't have any comments to that 👍 |
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsqlExpressionTranslators.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
@fang-xing-esql I removed the possibility to cast the query to a specific type in e723ae6 . I think this and the leniency work above should address your remaining concerns - LMKWYT! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest ! LGTM.
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…#117555) (#118297) * Fix and unmute synonyms tests using timeout (#117486) (cherry picked from commit 930a99c) # Conflicts: # muted-tests.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/10_synonyms_put.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/110_synonyms_invalid.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/20_synonyms_get.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/30_synonyms_delete.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/40_synonyms_sets_get.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/50_synonym_rule_put.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/60_synonym_rule_get.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/70_synonym_rule_delete.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/80_synonyms_from_index.yml # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/synonyms/90_synonyms_reloading_for_synset.yml * Fix merge
Match function only worked on text and keyword field types.
The match query actually supports other field types:
Adding these supported types to match function and operator, so the following queries are valid:
Using casting is also possible to refine the type used:
Using a string will always do the right thing, independently of the type used: